Abstract: Deep convolutional neural networks have led to
breakthrough results in practical feature extraction applications.
The mathematical analysis of these networks was pioneered by
Mallat [1]. Specifically, Mallat considered so-called scattering
networks based on identical semi-discrete wavelet frames in
each network layer, and proved translation-invariance as well
as deformation stability of the resulting feature extractor. The
purpose of this paper is to develop Mallat’s theory further
by allowing for different and, most importantly, general
semi-discrete frames (such as, e.g., Gabor frames, wavelets, curvelets,
shearlets, ridgelets) in distinct network layers. This allows the
extraction of wider classes of features than the point singularities
resolved by the wavelet transform. Our generalized feature extractor is
proven to be translation-invariant, and we develop deformation
stability results for a larger class of deformations than those
considered by Mallat. For Mallat’s wavelet-based feature extractor,
we get rid of a number of technical conditions. The mathematical
engine behind our results is continuous frame theory, which
allows us to completely detach the invariance and deformation
stability proofs from the particular algebraic structure of the
underlying frames.

I. Introduction 

A central task in signal classification is feature extraction
m- For example, we may want to detect whether an image
contains a certain handwritten digit 0. Moreover, this should
be possible independently of the feature’s spatial (or temporal)
location within the signal, which motivates the use of
translation-invariant feature extractors. In addition, sticking 
to the example of handwritten digits, we want the feature 
extractor to be robust with respect to different handwriting 
styles. This is typically accounted for by asking for stability 
with respect to non-linear deformations of the feature to be 
extracted. 

Spectacular success in many practical classification tasks
has been reported for feature extractors generated by deep
convolutional neural networks a, Q. The mathematical analysis
of such networks was initiated by Mallat in [1]. Mallat’s
theory applies to so-called scattering networks, where signals
are propagated through layers that compute the modulus of
wavelet coefficients. The resulting feature extractor is provably
translation-invariant and stable with respect to certain
non-linear deformations. Moreover, it leads to state-of-the-art
results in various image classification tasks 0, Q.

The wavelet transform resolves signal features characterized
by point singularities, but is not very effective in dealing with
signals dominated by anisotropic features, such as, e.g., edges
in images [8]. It thus seems natural to ask whether Mallat’s
theory on scattering networks can be extended to general
signal transformations. Moreover, certain audio classification
problems E) suggest that scattering networks with different
signal transformations in different layers would be desirable
in practice.

Contributions: The goal of this paper is to extend Mallat’s
theory to cope with general signal transformations (e.g.,
Gabor frames, wavelets, curvelets, shearlets, ridgelets), as well
as to allow different signal transformations in different layers
of the network, all while retaining translation-invariance
and deformation stability. Our second major contribution is a
new deformation stability bound valid for a class of non-linear
deformations that is wider than that considered by Mallat in
[1]. The proofs in [1] all hinge critically on the wavelet
transform’s structural properties, whereas the technical arguments
in our proofs are completely detached from the particular
structure of the signal transforms. This leads to simplified
and shorter proofs for translation-invariance and deformation
stability. Moreover, in the case of Mallat’s wavelet-based
feature extractor we show that the admissibility condition for
the mother wavelet (defined in [1, Theorem 2.6]) is not needed.
The mathematical engine behind our results is the theory of
continuous frames [10].

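Notation and preparatory material: The complex conjugate
of $z \in \mathbb{C}$ is denoted by $\overline{z}$. The Euclidean inner product
of $x, y \in \mathbb{C}^d$ is $\langle x, y \rangle := \sum_{i=1}^{d} x_i \overline{y_i}$, with associated norm
$|x| := \sqrt{\langle x, x \rangle}$. The supremum norm of a matrix $M \in \mathbb{R}^{d \times d}$
is defined by $|M|_\infty := \sup_{i,j} |M_{i,j}|$, and the supremum norm
of a tensor $T \in \mathbb{R}^{d \times d \times d}$ is $|T|_\infty := \sup_{i,j,k} |T_{i,j,k}|$. We
write $B_R(x) \subseteq \mathbb{R}^d$ for the open ball of radius $R > 0$
centered at $x \in \mathbb{R}^d$. The Borel $\sigma$-algebra of $\mathbb{R}^d$ is denoted
by $\mathcal{B}$. For a $\mathcal{B}$-measurable function $f : \mathbb{R}^d \to \mathbb{C}$, we write
$\int_{\mathbb{R}^d} f(x)\,dx$ for the integral of $f$ with respect to Lebesgue
measure $\mu_L$. For $p \in [1, \infty)$, $L^p(\mathbb{R}^d)$ denotes the space
of all $\mathcal{B}$-measurable functions $f : \mathbb{R}^d \to \mathbb{C}$ such that
$\|f\|_p := \big(\int_{\mathbb{R}^d} |f(x)|^p\,dx\big)^{1/p} < \infty$. For $f, g \in L^2(\mathbb{R}^d)$ we
set $\langle f, g \rangle := \int_{\mathbb{R}^d} f(x)\overline{g(x)}\,dx$. The operator norm of the
linear bounded operator $A : L^p(\mathbb{R}^d) \to L^q(\mathbb{R}^d)$ is designated
by $\|A\|_{p,q}$. $\mathrm{Id} : L^p(\mathbb{R}^d) \to L^p(\mathbb{R}^d)$ stands for the identity
operator on $L^p(\mathbb{R}^d)$. For a countably infinite set $Q$, $(L^2(\mathbb{R}^d))^Q$
denotes the space of sets $s := \{f_q\}_{q \in Q}$, $f_q \in L^2(\mathbb{R}^d)$,
$\forall q \in Q$, such that $|||s||| := \big(\sum_{q \in Q} \|f_q\|_2^2\big)^{1/2} < \infty$.

We write $S(\mathbb{R}^d)$ for the Schwartz space, i.e., the space of
functions $f : \mathbb{R}^d \to \mathbb{C}$ whose derivatives along with the
function itself are rapidly decaying [11, Section 7.3]. We
denote the Fourier transform of $f \in L^1(\mathbb{R}^d)$ by
$\hat{f}(\omega) := \int_{\mathbb{R}^d} f(x) e^{-2\pi i \langle x, \omega \rangle}\,dx$, and extend it in the usual way to
$L^2(\mathbb{R}^d)$ [11, Theorem 7.9]. The convolution of $f \in L^2(\mathbb{R}^d)$
and $g \in L^1(\mathbb{R}^d)$ is $(f * g)(y) := \int_{\mathbb{R}^d} f(x) g(y - x)\,dx$. We
write $T_t f(x) := f(x - t)$, $t \in \mathbb{R}^d$, for the translation operator,
and $M_\omega f(x) := e^{2\pi i \langle x, \omega \rangle} f(x)$, $\omega \in \mathbb{R}^d$, for the modulation
operator. Involution is defined by $(If)(x) := \overline{f(-x)}$. We
denote the gradient of a function $f : \mathbb{R}^d \to \mathbb{C}$ as $\nabla f$. For
a vector field $v : \mathbb{R}^d \to \mathbb{R}^d$, we write $Dv$ for its Jacobian
matrix, and $D^2 v$ for its Jacobian tensor, with associated norms
$\|v\|_\infty := \sup_{x \in \mathbb{R}^d} |v(x)|$, $\|Dv\|_\infty := \sup_{x \in \mathbb{R}^d} |(Dv)(x)|_\infty$,
and $\|D^2 v\|_\infty := \sup_{x \in \mathbb{R}^d} |(D^2 v)(x)|_\infty$. For a scalar field
$w : \mathbb{R}^d \to \mathbb{C}$, we define the norm $\|w\|_\infty := \sup_{x \in \mathbb{R}^d} |w(x)|$.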

II. Mallat’s wavelet-based feature extractor

We set the stage by briefly reviewing Mallat’s construction
[1]. The basis for Mallat’s feature extractor is a multi-stage
wavelet filtering technique followed by modulus operations.
The extracted features $\Phi_M(f)$ of a signal $f \in L^2(\mathbb{R}^d)$
are defined as the set of low-pass filtered functions

$$\big|\cdots\big|\,|f * \psi_{\lambda^{(j)}}| * \psi_{\lambda^{(k)}}\big| \cdots * \psi_{\lambda^{(p)}}\big| * \phi_J, \qquad (1)$$

labeled by the indices $\lambda^{(j)}, \lambda^{(k)}, \ldots, \lambda^{(p)} \in \Lambda_W :=
\{(j,k) \mid j > -J,\ k \in \{1, \ldots, K\}\}$ corresponding to pairs of
scales and directions. The wavelets $\{\psi_\lambda\}_{\lambda \in \Lambda_W}$ and the
low-pass filter $\phi_J$ are atoms of a semi-discrete Parseval wavelet
frame $\Psi_{\Lambda_W}$ and hence satisfy

$$\|\phi_J * f\|_2^2 + \sum_{\lambda \in \Lambda_W} \|\psi_\lambda * f\|_2^2 = \|f\|_2^2, \quad \forall f \in L^2(\mathbb{R}^d).$$

We refer the reader to Appendix A for a short review of the
theory of semi-discrete frames. The architecture corresponding
to (1), illustrated in Figure 1, is known as scattering network
ii, and uses the same wavelets $\{\psi_\lambda\}_{\lambda \in \Lambda_W}$ in every network
layer.

It is shown in [1] that the feature extractor in (1) is
translation-invariant, in the sense that

$$\Phi_M(T_t f) = T_t \Phi_M(f), \quad \forall t \in \mathbb{R}^d,\ \forall f \in L^2(\mathbb{R}^d),$$

where $T_t$ is applied element-wise in $T_t \Phi_M(f)$. Further, it is
proved in [1] that $\Phi_M$ is stable with respect to deformations
of the form

$$F_\tau f(x) := f(x - \tau(x)). \qquad (2)$$

Specifically, for the normed function space $(H_M, \|\cdot\|_{H_M})$
defined in (8) below, Mallat proved that there exists a constant
$C > 0$ such that for all $f \in H_M$ and every $\tau \in C^1(\mathbb{R}^d, \mathbb{R}^d)$
with* $\|D\tau\|_\infty \le \frac{1}{2d}$, the deformation error satisfies

$$|||\Phi_M(f) - \Phi_M(F_\tau f)||| \le C\big(2^{-J}\|\tau\|_\infty + J\|D\tau\|_\infty + \|D^2\tau\|_\infty\big)\|f\|_{H_M}. \qquad (3)$$

*It is actually the assumption $\|D\tau\|_\infty \le \frac{1}{2d}$, rather than $\|D\tau\|_\infty \le \frac{1}{2}$
as stated in [1, Theorem 2.12], that is needed in [1, Eq. E.31] to establish
$|\det(\mathrm{Id} - D\tau(x))| \ge 1 - d\|D\tau\|_\infty \ge 1/2$.
Fig. 1: Scattering network architecture based on wavelet
filtering.


III. Generalized feature extractor 

In this section, we describe our generalized feature extractor 
and start by introducing the notion of a frame collection. 

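Definition 1. For all $n \in \mathbb{N}$, let $\Psi_n$ be a semi-discrete frame
with frame bounds $A_n, B_n > 0$ and atoms $\{f_{\lambda_n'}\}_{\lambda_n' \in \Lambda_n'} \subseteq
L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ indexed by a countable set $\Lambda_n'$. The sequence
$\Psi := (\Psi_n)_{n \in \mathbb{N}}$ is called a frame collection with frame bounds
$A = \inf_{n \in \mathbb{N}} A_n$ and $B = \sup_{n \in \mathbb{N}} B_n$.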

The elements $\Psi_n$, $n \in \mathbb{N}$, in a frame collection correspond
to particular layers in the generalized scattering network
defined below. In Mallat’s construction one atom of the
semi-discrete wavelet frame $\Psi_{\Lambda_W}$, namely the low-pass filter $\phi_J$,
is singled out to generate the output set (1) of the feature
extractor $\Phi_M$. We honor Mallat’s terminology and designate
one of the atoms $\{f_{\lambda_n'}\}_{\lambda_n' \in \Lambda_n'}$ of each frame $\Psi_n$ in the
frame collection $\Psi$ as output-generating atom. Note, however,
that our theory does not require this atom to have low-pass
characteristics. Specifically, we set $\phi_n := f_{\lambda_n^*}$ for an arbitrary,
but fixed $\lambda_n^* \in \Lambda_n'$. From now on, we therefore write

$$\{\phi_n\} \cup \{f_{\lambda_n}\}_{\lambda_n \in \Lambda_n}, \quad \Lambda_n := \Lambda_n' \setminus \{\lambda_n^*\},$$

for the atoms of the semi-discrete frame $\Psi_n$. The reader might
want to think of the discrete index set $\Lambda_n$ as a collection of
scales, directions, or frequency-shifts.

Remark 1. Examples of structured frames that satisfy the
general semi-discrete frame condition (9) and will hence be
seen, in Theorem 1, to be applicable in the construction of
generalized feature extractors are, e.g., Gabor frames ns, 
curvelets /m?, shearlets lEl, ridgelets SHUl, MS, and, 
of course, wavelets snu as considered by Mallat in m- 

We now introduce our generalized scattering network. To 
this end, we generalize the multi-stage filtering technique 
underlying Mallat’s scattering network to allow for general 
semi-discrete frames that can, in addition, be different in 
different layers. This requires the definition of a general 
modulus-convolution operator, and of paths on index sets. 

Definition 2. Let $\Psi = (\Psi_n)_{n \in \mathbb{N}}$ be a frame collection with
atoms $\{\phi_n\} \cup \{f_{\lambda_n}\}_{\lambda_n \in \Lambda_n}$. For $1 \le m < \infty$, define the
set $\Lambda_1^m := \Lambda_1 \times \Lambda_2 \times \cdots \times \Lambda_m$. An ordered sequence
$q = (\lambda_1, \lambda_2, \ldots, \lambda_m) \in \Lambda_1^m$ is called a path. The empty path,
$e := \emptyset$, defines the set $\Lambda_1^0 := \{e\}$. The modulus-convolution
operator is defined as $U : \big(\bigcup_{n=1}^{\infty} \Lambda_n\big) \times L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^d)$,
$U(\lambda_n, f) := U[\lambda_n]f := |f * f_{\lambda_n}|$, where $f_{\lambda_n} \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$
are the atoms of the semi-discrete frame $\Psi_n$ associated
with the $n$-th layer in the network.

Fig. 2: Scattering network architecture based on general
multi-stage filtering (4). The function $f_{\lambda_n^{(k)}}$ is the $k$-th atom of the
semi-discrete frame $\Psi_n$ associated with the $n$-th layer.

We also need to extend the operator $U$ to paths $q \in \Lambda_1^m$
and do that according to

$$U[q]f := U[\lambda_m] \cdots U[\lambda_2] U[\lambda_1] f
= \big|\cdots\big|\,|f * f_{\lambda_1}| * f_{\lambda_2}\big| \cdots * f_{\lambda_m}\big|, \qquad (4)$$

where we set $U[e]f := f$. Note that the multi-stage
filtering operation (4) is well-defined, as
$\|U[q]f\|_2 \le \big(\prod_{n=1}^{m} \|f_{\lambda_n}\|_1\big)\|f\|_2$, thanks to Young’s inequality
[18, Theorem 1.2.12]. Figure 2 illustrates the generalized scattering
network with different semi-discrete frames in different layers.
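The path operator and the norm bound above can be tried out numerically. The following is a minimal sketch, assuming a finite, periodic stand-in for the continuous setting: signals are short lists, convolution is circular, and the two atoms are hypothetical filters chosen only for illustration (they are not atoms used in the paper). It applies $U[q]$ along a two-layer path and compares both sides of the Young-type bound.

```python
import math

def circ_conv(f, g):
    """Circular convolution on Z_N: (f * g)[i] = sum_k f[k] * g[(i - k) mod N]."""
    n = len(f)
    return [sum(f[k] * g[(i - k) % n] for k in range(n)) for i in range(n)]

def modulus_conv(f, atom):
    """Discrete stand-in for U[lambda] f = |f * f_lambda|."""
    return [abs(v) for v in circ_conv(f, atom)]

def apply_path(f, path_atoms):
    """U[q] f for q = (lambda_1, ..., lambda_m); the empty path returns f."""
    out = list(f)
    for atom in path_atoms:  # lambda_1 is applied first, lambda_m last
        out = modulus_conv(out, atom)
    return out

def norm1(v):
    return sum(abs(x) for x in v)

def norm2(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))

# Hypothetical signal and one hypothetical atom per layer (they differ per layer).
signal = [1.0, -2.0, 0.5, 3.0, 0.0, -1.0, 2.0, 0.25]
atom_layer1 = [0.5, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
atom_layer2 = [0.25, 0.0, -0.5, 0.0, 0.25, 0.0, 0.0, 0.0]
path = [atom_layer1, atom_layer2]

lhs = norm2(apply_path(signal, path))                      # ||U[q] f||_2
rhs = math.prod(norm1(a) for a in path) * norm2(signal)    # (prod ||f_lambda||_1) ||f||_2
print(lhs <= rhs)
```

The modulus does not change the 2-norm, so the bound follows by applying Young's inequality once per layer; the sketch only checks this on one example and is not a proof.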

We can now put the pieces together and define the
generalized feature extractor $\Phi_\Psi$.

Definition 3. Let $\Psi = (\Psi_n)_{n \in \mathbb{N}}$ be a frame collection, and
define $Q := \bigcup_{n=0}^{\infty} \Lambda_1^n$. Given a path $q \in \Lambda_1^n$, $n \ge 0$, we write
$\phi[q] := \phi_{n+1}$ for the output-generating atom of the
semi-discrete frame $\Psi_{n+1}$. The feature extractor $\Phi_\Psi$ with respect
to the frame collection $\Psi$ is defined as

$$\Phi_\Psi(f) := \{U[q]f * \phi[q]\}_{q \in Q}. \qquad (5)$$
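To make the indexing in Definition 3 concrete, here is a minimal sketch of a depth-truncated feature extractor, again assuming a finite, periodic setting with circular convolution. The function name `feature_extractor`, the truncation parameter `max_depth`, and the Haar-like filters `low` and `high` are assumptions made for illustration; the definition itself ranges over all path lengths.

```python
def circ_conv(f, g):
    """Circular convolution on Z_N."""
    n = len(f)
    return [sum(f[k] * g[(i - k) % n] for k in range(n)) for i in range(n)]

def feature_extractor(f, layers, max_depth):
    """Return {path: U[q]f * phi[q]} for all paths q of length <= max_depth.

    layers[n] = (phi, atoms) holds the output-generating atom phi_{n+1} and the
    remaining atoms of the frame used in layer n + 1 (0-based list index n).
    """
    features = {}
    current = {(): list(f)}  # empty path e, with U[e]f = f
    for n in range(max_depth + 1):
        phi, atoms = layers[n]
        for q, u in current.items():
            features[q] = circ_conv(u, phi)  # phi[q] = phi_{n+1} for paths of length n
        if n < max_depth:
            # Extend every path by one index from the (n+1)-th layer.
            current = {
                q + (idx,): [abs(v) for v in circ_conv(u, atom)]
                for q, u in current.items()
                for idx, atom in enumerate(atoms)
            }
    return features

def energy(v):
    return sum(abs(x) ** 2 for x in v)

# Hypothetical Haar-like pair; the same pair is reused in every layer for brevity,
# although each layer may use a different frame.
low = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
high = [0.5, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
layers = [(low, [high])] * 3

signal = [1.0, -2.0, 0.5, 3.0, 0.0, -1.0, 2.0, 0.25]
features = feature_extractor(signal, layers, max_depth=2)
total = sum(energy(v) for v in features.values())
print(sorted(features), total <= energy(signal) + 1e-12)
```

Paths are stored as tuples of atom indices, so the empty tuple plays the role of the empty path $e$. For this particular filter pair the squared DFT magnitudes sum to one at every frequency, which is why the collected output energy cannot exceed the signal energy.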

IV. Main result 

The main result of this paper is the following theorem,
stating that the feature extractor $\Phi_\Psi$ defined in (5) is
translation-invariant and stable with respect to time-frequency
deformations of the form

$$F_{\tau,\omega} f(x) := e^{2\pi i \omega(x)} f(x - \tau(x)). \qquad (6)$$

The class of deformations we consider is wider than the one in
Mallat’s theory, who considered translation-like deformations
of the form $f(x - \tau(x))$ only. Modulation-like deformations
$e^{2\pi i \omega(x)} f(x)$ occur, e.g., if we have access only to a band-pass
version of the signal $f \in L^2(\mathbb{R}^d)$.


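Theorem 1. Let $\Psi$ be a frame collection with upper frame
bound $B \le 1$. The feature extractor $\Phi_\Psi$ defined in (5) is
translation-invariant. Further, for $R > 0$, define the space of
$R$-band-limited functions

$$H_R := \{f \in L^2(\mathbb{R}^d) \mid \mathrm{supp}(\hat{f}) \subseteq B_R(0)\}.$$

Then, the feature extractor $\Phi_\Psi$ is stable on $H_R$ with respect
to non-linear deformations (6), i.e., there exists $C > 0$ (that
does not depend on $\Psi$) such that for all $f \in H_R$ and all
$\omega \in C(\mathbb{R}^d, \mathbb{R})$, $\tau \in C^1(\mathbb{R}^d, \mathbb{R}^d)$ with $\|D\tau\|_\infty \le \frac{1}{2d}$, it holds
that

$$|||\Phi_\Psi(f) - \Phi_\Psi(F_{\tau,\omega} f)||| \le C\big(R\|\tau\|_\infty + \|\omega\|_\infty\big)\|f\|_2. \qquad (7)$$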
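As a sanity check on (7), it may help to write out two special cases of the deformation (6). This is only an illustration of how the bound specializes; the constant shift $t_0$ and the constant frequency $\omega_0$ are hypothetical choices, not cases singled out in the text.

```latex
% Pure translation: \tau(x) = t_0 and \omega(x) = 0 for all x,
% so D\tau = 0 (the condition on \|D\tau\|_\infty holds trivially)
% and \|\tau\|_\infty = |t_0|.
F_{\tau,\omega} f = T_{t_0} f
\quad\Longrightarrow\quad
|||\Phi_\Psi(f) - \Phi_\Psi(T_{t_0} f)||| \le C\, R\, |t_0|\, \|f\|_2 .

% Pure modulation: \tau(x) = 0 and \omega(x) = \omega_0 for all x,
% so \|\tau\|_\infty = 0 and \|\omega\|_\infty = |\omega_0|.
F_{\tau,\omega} f = e^{2\pi i \omega_0} f
\quad\Longrightarrow\quad
|||\Phi_\Psi(f) - \Phi_\Psi(e^{2\pi i \omega_0} f)||| \le C\, |\omega_0|\, \|f\|_2 .
```

In the first case the bound scales with the product of the bandwidth $R$ and the shift size, so small shifts of slowly varying signals change the features only slightly.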

The proof of Theorem 1 can be found in Appendix B. Our
main result shows that translation-invariance and deformation
stability are retained for the generalized feature extractor $\Phi_\Psi$.
The strength of this result derives from the fact that
the only condition on $\Psi$ for this to hold is $B \le 1$. This
condition is easily met by normalizing the frame elements
accordingly. Such a normalization impacts neither translation-invariance
nor the constant $C$ in (7), which is seen, in (14), to
be independent of $\Psi$. All this is thanks to our proof techniques,
unlike those in [1], being independent of the algebraic structure
of the underlying frames. This is accomplished through a
generalization of a Lipschitz-continuity result by Mallat [1,
Proposition 2.5] for the feature extractor $\Phi_\Psi$ (stated in
Proposition 2 in Appendix B), and by employing a partition of unity
argument HD for band-limited functions.

V. Relation to Mallat’s results 

To see how Mallat’s wavelet-based architecture is covered
by our Theorem 1, simply note that by [1, Eq. 2.7] the atoms
$\{\phi_J\} \cup \{\psi_\lambda\}_{\lambda \in \Lambda_W}$ satisfy (10) with $A = B = 1$. Since
Mallat’s construction uses the same wavelet frame in each
layer, this trivially implies $\sup_{n \in \mathbb{N}} B_n \le 1$.

Mallat imposes additional technical conditions on the atoms
$\{\phi_J\} \cup \{\psi_\lambda\}_{\lambda \in \Lambda_W}$, one of which is the so-called scattering
admissibility condition for the mother wavelet, defined in [1,
Theorem 2.6]. To the best of our knowledge, no wavelet in
dimension $d \ge 2$ satisfying this condition has been reported in the
literature.

Mallat’s stability bound (3) applies to signals $f \in L^2(\mathbb{R}^d)$
satisfying

$$\|f\|_{H_M} := \sum_{m=0}^{\infty} \Big(\sum_{q \in (\Lambda_W)^m} \|U[q]f\|_2^2\Big)^{1/2} < \infty. \qquad (8)$$

While [1, Section 2.5] cites numerical evidence on (8) being
finite for a large class of functions $f \in L^2(\mathbb{R}^d)$, it seems
difficult to establish this analytically.

Finally, the stability bound (3) depends on the parameter $J$,
which determines the coarsest scale resolved by the wavelets
$\{\psi_\lambda\}_{\lambda \in \Lambda_W}$. For $J \to \infty$ the term $2^{-J}\|\tau\|_\infty$ vanishes;
however, the term $J\|D\tau\|_\infty$ tends to infinity.

Our main result shows that i) the scattering admissibility
condition in [1] is not needed, ii) instead of the signal class
characterized by (8) our result applies provably to the space
of $R$-band-limited functions $H_R$, and iii) our deformation
stability bound (7), when particularized to wavelets, besides
applying to a wider class of non-linear deformations, namely
(6) instead of (2), is independent of $J$.

The proof technique used in [1] to establish (3) makes heavy
use of structural specifics of the atoms $\{\phi_J\} \cup \{\psi_\lambda\}_{\lambda \in \Lambda_W}$,
namely isotropic dilations, vanishing moment conditions, and a
constant number $K \in \mathbb{N}$ of directional wavelets across scales.


Appendix A 
Semi-discrete frames 

This appendix gives a short review of semi-discrete frames
[10].

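Definition 4. Let $\{f_\lambda\}_{\lambda \in \Lambda} \subseteq L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ be a set of
functions indexed by a countable set $\Lambda$. The set of translated
and involuted functions

$$\Psi_\Lambda := \{T_b I f_\lambda\}_{(\lambda, b) \in \Lambda \times \mathbb{R}^d}$$

is called a semi-discrete frame, if there exist constants $A, B > 0$
such that

$$A\|f\|_2^2 \le \sum_{\lambda \in \Lambda} \|f * f_\lambda\|_2^2 \le B\|f\|_2^2 \qquad (9)$$

for all $f \in L^2(\mathbb{R}^d)$. The functions $\{f_\lambda\}_{\lambda \in \Lambda}$ are called the
atoms of the semi-discrete frame $\Psi_\Lambda$. When $A = B$ the
semi-discrete frame is said to be tight. A tight semi-discrete frame
with frame bound $A = 1$ is called a semi-discrete Parseval
frame.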


The frame operator associated with the semi-discrete frame
$\Psi_\Lambda$ is defined in the weak sense by $S_\Lambda : L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^d)$,

$$S_\Lambda f = \Big(\sum_{\lambda \in \Lambda} f_\lambda * I f_\lambda\Big) * f.$$

$S_\Lambda$ is a bounded, positive, and boundedly invertible operator
cni.

The reader might want to think of semi-discrete frames as
shift-invariant frames [20], where the translation parameter is
left unsampled. The discrete index set $\Lambda$ typically labels a
collection of scales, directions, or frequency-shifts. For instance,
as illustrated in Section II, Mallat’s scattering network is
based on a semi-discrete Parseval frame of directional wavelet
structure, where the atoms $\{\phi_J\} \cup \{\psi_\lambda\}_{\lambda \in \Lambda_W}$ are indexed by
the set $\Lambda_W = \{(j,k) \mid j > -J,\ k \in \{1, \ldots, K\}\}$, labeling a
collection of scales and directions.

For shift-invariant frames it is often convenient to work with 
a unitarily equivalent representation of the frame operator. 


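Proposition 1. [17, Theorem 5.11] Let $\Lambda$ be a countable index
set. The functions $\{f_\lambda\}_{\lambda \in \Lambda} \subseteq L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ are atoms of
the semi-discrete frame $\Psi_\Lambda = \{T_b I f_\lambda\}_{(\lambda, b) \in \Lambda \times \mathbb{R}^d}$ with frame
bounds $A, B > 0$ if and only if

$$A \le \sum_{\lambda \in \Lambda} |\hat{f}_\lambda(\omega)|^2 \le B, \quad \text{a.e. } \omega \in \mathbb{R}^d. \qquad (10)$$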
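The frequency-domain condition in Proposition 1 suggests a simple way to read off frame bounds and to rescale atoms so that the upper bound is at most one. The sketch below assumes a finite, periodic analogue in which the Fourier transform is replaced by the DFT; the helper names `frame_bounds` and `normalize` and the two filters are hypothetical and serve only as an illustration.

```python
import cmath
import math

def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def frame_bounds(atoms):
    """Return (A, B): min and max over frequencies of sum_lambda |f_hat_lambda|^2."""
    spectra = [dft(a) for a in atoms]
    n = len(atoms[0])
    lp_sum = [sum(abs(s[k]) ** 2 for s in spectra) for k in range(n)]
    return min(lp_sum), max(lp_sum)

def normalize(atoms):
    """Divide all atoms by sqrt(B) so that the upper frame bound becomes 1."""
    _, upper = frame_bounds(atoms)
    scale = math.sqrt(upper)
    return [[v / scale for v in a] for a in atoms]

# Hypothetical atoms: a low-pass and a (weaker) high-pass filter of length 8.
atoms = [
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
A, B = frame_bounds(atoms)
A_norm, B_norm = frame_bounds(normalize(atoms))
print(A, B, A_norm, B_norm)
```

Rescaling all atoms by the same factor multiplies both bounds by the square of that factor, so the ratio $A/B$ is unchanged by the normalization.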



Fig. 3: Frequency plane partitions induced by atoms
$\{\hat{f}_\lambda\}_{\lambda \in \Lambda}$ of semi-discrete tensor wavelets (left), semi-discrete
curvelets (center), and semi-discrete cone-adapted shearlets
(right).


Appendix B 
Proof of Theorem 1

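We first prove translation-invariance. Fix $f \in L^2(\mathbb{R}^d)$ and
define $C[q]f := U[q]f * \phi[q]$, $\forall q \in Q$. By (5) it follows that
$\Phi_\Psi$ is translation-invariant if and only if

$$C[q](T_t f) = T_t(C[q]f), \quad \forall t \in \mathbb{R}^d,\ \forall q \in Q. \qquad (11)$$

Due to $C[q](T_t f) = U[q](T_t f) * \phi[q]$ and

$$T_t(C[q]f) = T_t(U[q]f * \phi[q]) = (T_t(U[q]f)) * \phi[q],$$

(11) holds if $U[q](T_t f) = T_t(U[q]f)$, $\forall t \in \mathbb{R}^d$, $\forall q \in Q$. The
proof is concluded by noting that $U[q]$ is translation-invariant
thanks to (4) and

$$U[\lambda_n](T_t f) = |(T_t f) * f_{\lambda_n}| = |T_t(f * f_{\lambda_n})| = T_t(U[\lambda_n]f),$$

for all $t \in \mathbb{R}^d$, $\lambda_n \in \bigcup_{k=1}^{\infty} \Lambda_k$.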
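The single-layer identity that closes the argument above can be checked numerically. This is a minimal sketch, assuming a periodic discrete signal with integer circular shifts in place of $T_t$; the signal, the atom, and the shift are hypothetical values chosen for illustration.

```python
def circ_conv(f, g):
    """Circular convolution on Z_N."""
    n = len(f)
    return [sum(f[k] * g[(i - k) % n] for k in range(n)) for i in range(n)]

def translate(f, t):
    """Circular stand-in for T_t f(x) = f(x - t)."""
    n = len(f)
    return [f[(i - t) % n] for i in range(n)]

def modulus_conv(f, atom):
    """Discrete stand-in for U[lambda] f = |f * f_lambda|."""
    return [abs(v) for v in circ_conv(f, atom)]

signal = [1.0, -2.0, 0.5, 3.0, 0.0, -1.0, 2.0, 0.25]
atom = [0.25, -0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
shift = 3

lhs = modulus_conv(translate(signal, shift), atom)   # U[lambda](T_t f)
rhs = translate(modulus_conv(signal, atom), shift)   # T_t(U[lambda] f)
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_gap < 1e-12)
```

The two sides are compared up to a small tolerance because the sums are accumulated in a different order, which can change the floating-point rounding.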

Let us now turn to the proof of deformation stability, which
is based on two key ingredients, the first being a generalization
of a Lipschitz-continuity result by Mallat [1, Proposition 2.5]:

Proposition 2. Let $\Psi$ be a frame collection with upper frame
bound $B \le 1$. The feature extractor $\Phi_\Psi : L^2(\mathbb{R}^d) \to
(L^2(\mathbb{R}^d))^Q$ is a bounded, Lipschitz-continuous operator with
Lipschitz constant $L = \sqrt{B}$, i.e.,

$$|||\Phi_\Psi(f) - \Phi_\Psi(h)||| \le \sqrt{B}\,\|f - h\|_2,$$

for all $f, h \in L^2(\mathbb{R}^d)$.

The proof of Proposition 2 is not given here, as it essentially
follows that of [1, Proposition 2.5] with minor changes. We
now apply Proposition 2 with $h := F_{\tau,\omega} f$ and get

$$|||\Phi_\Psi(f) - \Phi_\Psi(F_{\tau,\omega} f)||| \le \|f - F_{\tau,\omega} f\|_2$$

for all $f \in L^2(\mathbb{R}^d)$. Here, we used $\sqrt{B} \le 1$, due to $B \le 1$,
as well as $h = F_{\tau,\omega} f \in L^2(\mathbb{R}^d)$, which is thanks to

$$\|h\|_2^2 = \|F_{\tau,\omega}(f)\|_2^2 = \int_{\mathbb{R}^d} |f(x - \tau(x))|^2\,dx \le 2\|f\|_2^2,$$

obtained through the change of variables $u = x - \tau(x)$,
together with

$$\frac{du}{dx} = |\det(\mathrm{Id} - D\tau(x))| \ge 1 - d\|D\tau\|_\infty \ge 1/2. \qquad (12)$$


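The inequalities in (12) hold thanks to [21, Corollary 1] and
$\|D\tau\|_\infty \le \frac{1}{2d}$, respectively. The second key ingredient of our
proof is a partition of unity argument HU for band-limited
functions used to derive an upper bound on $\|f - F_{\tau,\omega} f\|_2$. We
first determine a function $\gamma$ such that $f = f * \gamma$ for all $f \in H_R$.
Consider $\eta \in S(\mathbb{R}^d)$ such that $\hat{\eta}(\omega) = 1$, $\forall \omega \in B_1(0)$. Setting
$\gamma(x) := R^d \eta(Rx)$ yields $\hat{\gamma}(\omega) = \hat{\eta}(\omega / R)$. Thus, $\hat{\gamma}(\omega) = 1$,
$\forall \omega \in B_R(0)$, as well as $\hat{f} = \hat{f}\hat{\gamma}$ and $f = f * \gamma$ for all $f \in H_R$.
Then, we define the operator $A_\gamma : L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^d)$,
$A_\gamma f := f * \gamma$. Note that $A_\gamma$ is well-defined as $\gamma \in S(\mathbb{R}^d) \subseteq L^1(\mathbb{R}^d)$.
We now get

$$\|f - F_{\tau,\omega} f\|_2 = \|A_\gamma f - F_{\tau,\omega} A_\gamma f\|_2
\le \|A_\gamma - F_{\tau,\omega} A_\gamma\|_{2,2}\,\|f\|_2$$

for all $f \in H_R$. In order to bound the norm $\|A_\gamma - F_{\tau,\omega} A_\gamma\|_{2,2}$,
we apply Schur’s Lemma to the integral operator $F_{\tau,\omega} A_\gamma - A_\gamma$.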

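Schur’s Lemma. [18, App. I.1] Let $k : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{C}$ be a
locally integrable function satisfying $\sup_{u \in \mathbb{R}^d} \int_{\mathbb{R}^d} |k(x,u)|\,dx \le C$
and $\sup_{x \in \mathbb{R}^d} \int_{\mathbb{R}^d} |k(x,u)|\,du \le C$. Then, the integral
operator $K$ given by $K(f)(x) = \int_{\mathbb{R}^d} f(u) k(x,u)\,du$, is a bounded
operator from $L^2(\mathbb{R}^d)$ to $L^2(\mathbb{R}^d)$ with norm $\|K\|_{2,2} \le C$.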
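A finite-dimensional analogue of Schur's Lemma is easy to test: for a matrix kernel, integrals become row and column sums of absolute values. The sketch below assumes this discrete setting; the kernel entries, the test vectors, and the helper names are hypothetical.

```python
import math

def schur_constant(kernel):
    """Smallest C bounding both the absolute row sums and absolute column sums."""
    row = max(sum(abs(v) for v in r) for r in kernel)
    col = max(sum(abs(r[j]) for r in kernel) for j in range(len(kernel[0])))
    return max(row, col)

def apply_kernel(kernel, f):
    """Discrete stand-in for K(f)(x) = sum_u f(u) k(x, u)."""
    return [sum(k_xu * f_u for k_xu, f_u in zip(row, f)) for row in kernel]

def norm2(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))

# Hypothetical 3 x 3 kernel and a few hypothetical test vectors.
kernel = [
    [0.5, -0.25, 0.0],
    [0.1, 0.3, -0.2],
    [0.0, 0.4, 0.6],
]
C = schur_constant(kernel)
test_vectors = [[1.0, 0.0, 0.0], [1.0, -1.0, 1.0], [0.3, 2.0, -1.5]]
bound_holds = all(
    norm2(apply_kernel(kernel, f)) <= C * norm2(f) + 1e-12 for f in test_vectors
)
print(C, bound_holds)
```

Checking a handful of vectors does not compute the operator norm; it only illustrates that the row and column sums control how much the kernel can amplify the 2-norm.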

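From the identity

$$F_{\tau,\omega} A_\gamma(f)(x) = e^{2\pi i \omega(x)} \int_{\mathbb{R}^d} \gamma(x - \tau(x) - u) f(u)\,du$$

it follows that $F_{\tau,\omega} A_\gamma - A_\gamma$ has the kernel function
$k(x,u) := e^{2\pi i \omega(x)} \gamma(x - \tau(x) - u) - \gamma(x - u)$, which is locally integrable
thanks to $\gamma \in S(\mathbb{R}^d)$ and $\tau \in C^1(\mathbb{R}^d, \mathbb{R}^d)$. We next use
a first-order Taylor expansion in order to bound $|k(x,u)|$.
To this end, let $x, u \in \mathbb{R}^d$, and define $h^{x,u} : \mathbb{R} \to \mathbb{C}$,
as $h^{x,u}(t) := e^{2\pi i t \omega(x)} \gamma(x - t\tau(x) - u) - \gamma(x - u)$. It
follows that $h^{x,u}(0) = 0$ and $h^{x,u}(1) = k(x,u)$. Therefore,
we have $h^{x,u}(t) = h^{x,u}(0) + \int_0^t \big(\tfrac{d}{d\lambda} h^{x,u}\big)(\lambda)\,d\lambda$, $\forall t \in \mathbb{R}$.
The special choice $t = 1$ yields $|k(x,u)| = |h^{x,u}(1)| \le
\int_0^1 \big|\big(\tfrac{d}{d\lambda} h^{x,u}\big)(\lambda)\big|\,d\lambda$ with

$$\Big|\Big(\frac{d}{d\lambda} h^{x,u}\Big)(\lambda)\Big|
\le |\langle \nabla\gamma(x - \lambda\tau(x) - u), \tau(x) \rangle|
+ |2\pi \omega(x) \gamma(x - \lambda\tau(x) - u)|$$
$$\le \|\tau\|_\infty |\nabla\gamma(x - \lambda\tau(x) - u)|
+ 2\pi \|\omega\|_\infty |\gamma(x - \lambda\tau(x) - u)|.$$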

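Thanks to $\gamma, \nabla\gamma \in S(\mathbb{R}^d)$, and $\mu_L([0,1]) = 1 < \infty$, we can
apply Fubini’s Theorem to get

$$\int_{\mathbb{R}^d} |k(x,u)|\,du \le \|\tau\|_\infty \int_0^1 \int_{\mathbb{R}^d} |\nabla\gamma(x - \lambda\tau(x) - u)|\,du\,d\lambda$$
$$\quad + 2\pi \|\omega\|_\infty \int_0^1 \int_{\mathbb{R}^d} |\gamma(x - \lambda\tau(x) - u)|\,du\,d\lambda$$
$$\le \|\tau\|_\infty \|\nabla\gamma\|_1 + 2\pi \|\omega\|_\infty \|\gamma\|_1$$
$$= R\|\tau\|_\infty \|\nabla\eta\|_1 + 2\pi \|\omega\|_\infty \|\eta\|_1.$$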
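The final equality in the preceding display rests on how the $L^1$-norms behave under the scaling that defines $\gamma$. The short derivation below spells this out, assuming only $\gamma(x) = R^d \eta(Rx)$ and the substitution $y = Rx$; it is a worked check added for convenience, not a step quoted from the proof.

```latex
% Fourier transform of the rescaled function (substitute y = Rx, dy = R^d dx):
\hat{\gamma}(\omega)
  = \int_{\mathbb{R}^d} R^d \eta(Rx)\, e^{-2\pi i \langle x, \omega \rangle}\, dx
  = \int_{\mathbb{R}^d} \eta(y)\, e^{-2\pi i \langle y, \omega / R \rangle}\, dy
  = \hat{\eta}(\omega / R).

% The L^1-norm is invariant under this scaling:
\|\gamma\|_1
  = \int_{\mathbb{R}^d} R^d |\eta(Rx)|\, dx
  = \int_{\mathbb{R}^d} |\eta(y)|\, dy
  = \|\eta\|_1 .

% The gradient picks up one extra factor of R by the chain rule:
\nabla\gamma(x) = R^{d+1} (\nabla\eta)(Rx)
\quad\Longrightarrow\quad
\|\nabla\gamma\|_1
  = R \int_{\mathbb{R}^d} R^d |(\nabla\eta)(Rx)|\, dx
  = R\, \|\nabla\eta\|_1 .
```

This is where the bandwidth $R$ enters the bound: only the gradient term, which multiplies $\|\tau\|_\infty$, scales with $R$.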


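Similarly, we obtain

$$\int_{\mathbb{R}^d} |k(x,u)|\,dx \le \|\tau\|_\infty \int_0^1 \int_{\mathbb{R}^d} |\nabla\gamma(x - \lambda\tau(x) - u)|\,dx\,d\lambda$$
$$\quad + 2\pi \|\omega\|_\infty \int_0^1 \int_{\mathbb{R}^d} |\gamma(x - \lambda\tau(x) - u)|\,dx\,d\lambda$$
$$\le 2\|\tau\|_\infty \|\nabla\gamma\|_1 + 4\pi \|\omega\|_\infty \|\gamma\|_1$$
$$= 2R\|\tau\|_\infty \|\nabla\eta\|_1 + 4\pi \|\omega\|_\infty \|\eta\|_1$$

by the change of variables $y = x - \lambda\tau(x) - u$, together with

$$\frac{dy}{dx} = |\det(\mathrm{Id} - \lambda D\tau(x))| \ge 1 - \lambda d\|D\tau\|_\infty \ge 1/2. \qquad (13)$$

The inequalities in (13) hold thanks to [21, Corollary 1],
$\|D\tau\|_\infty \le \frac{1}{2d}$, and $\lambda \in [0,1]$. The proof is completed by
setting

$$C := \max\{2\|\nabla\eta\|_1, 4\pi\|\eta\|_1\}\big(R\|\tau\|_\infty + \|\omega\|_\infty\big). \qquad (14)$$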